The empirical analysis is based on the joint availability of firm-level data from three sources: 1) public US-based firms in Compustat, 2) disambiguated patent assignee data from Kogan et al. (2017), the United States Patent and Trademark Office, and the Fung Institute at UC Berkeley (Balsmeier et al. 2018), and 3) the NBER-CES Manufacturing Industry Database (Bartelsman & Gray, 1996). We build firm-level patent portfolios by aggregating eventually granted US patents from 1958 (first year of availability of the NBER-CES industry data) through 2011 inclusive (last year of availability of the NBER-CES industry data). Because our measures have no obvious value for firms that do not patent or that patent for the first time, we only include firm-year observations in which the firm applied for at least one patent in that year and had patented at least once in any previous year. When calculating a firm’s known classes, we take all patents granted to that firm back to 1926 into account. Finally, we restrict the sample to firms that we observe at least twice and that have no missing values in any control variable. The final dataset is an unbalanced panel of 24,419 firm-year observations on 2,130 firms in 123 manufacturing industries, observed between 1958 and 2011.
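The sample selection rules described above can be sketched as follows. This is a minimal illustration in Python, not the Stata code used for the paper. The input layouts, the function name `select_sample`, and the order in which the rules are applied (the "observed at least twice" rule last) are assumptions made for the example.

```python
from collections import defaultdict

def select_sample(patent_counts, controls, first_year=1958, last_year=2011):
    """Return sorted (firm, year) pairs that satisfy the sample rules.

    patent_counts: dict mapping (firm, year) -> number of eventually granted
        patents applied for in that year (may reach back before first_year).
    controls: dict mapping (firm, year) -> dict of control variables,
        where None marks a missing value.
    Both layouts are hypothetical and chosen for illustration only.
    """
    # Years in which each firm applied for at least one patent.
    active_years = defaultdict(set)
    for (firm, year), n in patent_counts.items():
        if n > 0:
            active_years[firm].add(year)

    kept = defaultdict(list)
    for firm, years in active_years.items():
        first_patent_year = min(years)
        for year in sorted(years):
            # Stay inside the window covered by the industry data.
            if not first_year <= year <= last_year:
                continue
            # Rule 1: the firm must have patented in some earlier year.
            if year <= first_patent_year:
                continue
            # Rule 2: no control variable may be missing.
            row = controls.get((firm, year))
            if row is None or any(v is None for v in row.values()):
                continue
            kept[firm].append(year)

    # Rule 3 (assumed to be applied last): firms observed at least twice.
    return sorted((f, y) for f, ys in kept.items() if len(ys) >= 2 for y in ys)
```

Because the patent history may start before 1958, a firm whose first patent predates the window can already enter the sample in its first in-window patenting year.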

Results were produced with Stata version 14.

To reproduce the analysis sample and results in the paper you need the following data:

- Compustat data (downloaded on September 18, 2015 through WRDS), variables: emp (Employees), sale (Sales/Turnover (Net)), xrd (Research and Development Expense), ppent (Property, Plant and Equipment - Total (Net)), gvkey (Global Company Key). Please use your own version, as Compustat does not allow making their data publicly available.

- Consumer price inflation index from the International Monetary Fund, downloaded on November 6, 2018 from https://data.imf.org, file: cpi1913_2017.dta, variable: cpi

- Industry output data come from the NBER-CES database, downloaded from http://data.nber.org/nberces/ on November 6, 2018, file: output_nber_sic4_2020.dta, variable: output (measured in t-1). We took industry output at the SIC 4-digit level, calculated as the sum of value added and material costs per industry, deflated by each industry's shipments deflator ‘piship’ as provided by the NBER-CES database. Dataset: nberces.dta

- GDP data: from https://www.bea.gov/, downloaded on August 14, 2020, variable: gdpgr_chained2012dollars, dataset: gdp.dta

- Patent data1: we start with the extended data through 2019, downloaded on August 9, 2020, from: https://github.com/KPSS2017/Technological-Innovation-Resource-Allocation-and-Growth-Extended-Data. These data provide an updated series of the CRSP "permno" match following Kogan, L., Papanikolaou, D., Seru, A. and Stoffman, N., 2017. Technological innovation, resource allocation, and growth. Quarterly Journal of Economics, 132(2), pp. 665-712. The paper is available at https://academic.oup.com/qje/article/132/2/665/3076284. We keep only patents for which we have a firm identifier as provided by KPSS. Datasets: KPSS_2019_public.csv + KPSS_2017_public.csv (for fdates)

- Patent data2: Tech class data (uspc) comes from USPTO historical data, downloaded on August 17, 2018 at https://www.uspto.gov/ip-policy/economic-research/research-datasets/historical-patent-data-files. The annual dataset contains counts of in-force and issued patents from 1840 to 2014 by NBER sub-category.  The monthly file contains a monthly count of applications, issued patents, and in-force patents by application status, disposal type (abandoned, issued, or pending), and NBER sub-category from 1981 to 2014.  The monthly_disposal dataset contains counts of application by disposal type for each monthly application cohort by NBER sub-category from 1981 to 2014. The historical_masterfile contains micro-level application, NBER sub-category, and prosecution data on 2.2 million patent applications filed from 1981 to 2014 and 8.9 million patents issued through 2014. Three intermediate files (orders, orders_class, and orders_subclass) used to generate the four datasets are also available for download. A document describing these data is available as: Marco, Alan C. and Carley, Michael and Jackson, Steven and Myers, Amanda F., The USPTO Historical Patent Data Files: Two Centuries of Innovation (June 1, 2015). SSRN working paper, available at http://ssrn.com/abstract=2616724
We used the priority date as provided by the USPTO (prior_dt); if the priority date was not available, we used the filing date as provided by KPSS 2017. Dataset: historical_masterfile_short.dta

- Patent data3: patent inventor data comes from Balsmeier et al. 2018, Machine learning and natural language processing on the patent corpus: Data, tools, and new measures, available at: https://onlinelibrary.wiley.com/doi/abs/10.1111/jems.12259, data for download at: https://doi.org/10.7910/DVN/KPMMPV. Dataset: inventor.geo.assignee.combo.disambig.tsv

- Patent data4: Differentiation between product and process patents comes from Seliger et al. (2020), “EPO-ARP Knowledge Spillovers from Product and Process Inventions in Patents and their Impact on Firm Performance”, variables: nproduct_ind, nprocess_ind, for data and details see: https://dataverse.harvard.edu/dataverse/product_process_patents. Dataset: prod_proc.dta

- Patent data5: Appropriability data comes from Cohen, Nelson and Walsh (2000), “Protecting Their Intellectual Assets: Appropriability Conditions and Why U.S. Manufacturing Firms Patent (or Not)”, NBER working paper 7552, https://www.nber.org/papers/w7552, Table 1. We merged based on ISIC codes provided in CNW, data: cnw2000t1.dta, variable ‘patents’.

- Patent data6: data on future cites of patents comes from PatentsView, downloaded on September 7, 2020, https://patentsview.org/download/data-download-tables, dataset: ‘uspatentcitation.tsv’
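
Two of the data preparation rules in the list above are simple enough to sketch in code: the deflated industry output measure and the fallback from priority date to filing date. The Python below is an illustrative sketch only, not the authors' Stata code. The names `piship` and `prior_dt` come from the list above; every other name (`real_output`, `lag_by_one_year`, `patent_date`, `value_added`, `material_cost`) is hypothetical, and reading "value added and material costs" as a sum is an assumption.

```python
from datetime import date

def real_output(value_added, material_cost, piship):
    """Industry output: value added plus material costs, deflated by piship.

    Treating the two components as a sum is an assumption of this sketch.
    """
    if piship is None or piship <= 0:
        raise ValueError("piship must be a positive deflator")
    return (value_added + material_cost) / piship

def lag_by_one_year(series):
    """Shift a {year: value} series so that year t carries the value of t-1."""
    return {year + 1: value for year, value in series.items()}

def patent_date(prior_dt, filing_dt):
    """Use the USPTO priority date when present, else the KPSS filing date."""
    return prior_dt if prior_dt is not None else filing_dt
```

For example, `patent_date(None, date(1999, 5, 1))` falls back to the filing date, while a patent with a recorded `prior_dt` keeps that date.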

 
Do-files:

- an_mbf: main file to reproduce results with analysis sample
- cr_out: creates industry specific output measure
- cr_cyc: creates cyclicality measure
- cr_tp: creates innovative search measure
- cr_g1: produces graph1
- cr_g2: produces graph2
- cr_t5cf: produces columns c-f of table 5
- cr_t9: produces table 9



  
